AllRecipes are not created equal

Joanna Purich

December 14, 2018

Standard Recipe

Motivation

This tutorial serves as the final project for CMSC641 at University of Maryland.

Introduction

AllRecipes is a website that allows users to share their recipes and discover new recipes that other users have shared. Users can indicate that they attempted a recipe, rate it, and provide comments.

The objective of this tutorial is to discover how ingredients contribute to the success of a recipe. Along the way, we will examine other factors such as cook time and preparation steps and evaluate their contribution to a recipe's rating. We will scrape data directly from the AllRecipes site, build a SQLite database to store the metadata for each recipe, clean the data, and draw out insights. The majority of this tutorial will focus on data acquisition, storage, and processing.

Required Tools

In order to execute this tutorial you will need a Python 3 installation. We suggest installing using Anaconda.

You will also need the following Python packages:

  • pandas
  • seaborn
  • requests
  • sqlite3 (should come pre-installed with Python)
  • BeautifulSoup
  • sklearn
  • networkx
  • matplotlib
  • time (should come pre-installed with Python)
  • re (should come pre-installed with Python)

We will also make use of the NYTimes Ingredient-Phrase-Tagger to clean the freeform ingredients section of each recipe. At the time of writing, this project isn't fully Python 3 compatible because its print statements are missing parentheses. I've submitted a pull request to correct this. There are also Pandas warnings that should be addressed, but they do not currently break the code.

Before progressing, you will need to clone the repo into the same directory as this project and make sure that you have CRF++ installed. Please follow the directions they provide.

Scraping

AllRecipes no longer provides free access to their Recipes API. Therefore, we will need to scrape data directly from their site using Requests and BeautifulSoup. We will need to crawl every recipe in the AllRecipes domain and parse the HTML to find elements that contain data of interest. A quick check of the site's robots.txt file shows us that this usage is allowable, but that we will need to limit requests to 1 per second.
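This kind of robots.txt check can also be done programmatically with the standard library's parser. The directives below are illustrative stand-ins, not AllRecipes' actual robots.txt file:

```python
from urllib.robotparser import RobotFileParser

# A minimal sketch of checking crawl permissions with urllib.robotparser.
# These directives are hypothetical examples, not the site's real file;
# in practice you would call rp.set_url(...) and rp.read() instead.
rp = RobotFileParser()
rp.parse("""User-agent: *
Crawl-delay: 1
Disallow: /private/""".splitlines())

print(rp.can_fetch("*", "https://www.allrecipes.com/recipes/"))  # allowed?
print(rp.crawl_delay("*"))                                       # seconds between requests
```

The crawl_delay value is what drives the `time.sleep(1)` calls used throughout the scraper below.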

To start, we will need to compile a list of URLs for each recipe that we want to capture. The homepage features an infinite scroll of recipe cards, so we can paginate through each set of cards and record the recipe URLs. Using a manual binary search, I discovered the "last page" of recipe results and recorded its page number so that I knew when to stop paging through results.
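For reference, that manual binary search could be sketched in code, where `has_cards` is a hypothetical probe that requests a page and reports whether any recipe cards were rendered:

```python
# Sketch of finding the last page of results via binary search.
# Assumes pages past the end render zero recipe cards; has_cards() is
# a hypothetical helper standing in for a live request + HTML check.

def find_last_page(has_cards, hi_guess=4096):
    """Return the largest page number for which has_cards(page) is True."""
    lo, hi = 1, hi_guess
    # Grow the upper bound until it is past the last page.
    while has_cards(hi):
        lo, hi = hi, hi * 2
    # Invariant: has_cards(lo) is True, has_cards(hi) is False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_cards(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Toy stand-in: pretend the site has 3239 pages of recipe cards.
print(find_last_page(lambda page: page <= 3239))
```

Each probe costs one request (plus the one-second delay), so this takes roughly log2 of the page count, on the order of a dozen requests instead of thousands.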

In [1]:
import requests
import time
import csv
from bs4 import BeautifulSoup

f = open('allrecipe_urls.csv', 'a+')
out = csv.writer(f)

base_url = "http://allrecipes.com/recipes/?sort=Title&page="

for page in range(1,3240):
    time.sleep(1)
    
    url = base_url + str(page)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    htmlUrls = soup.find_all('article', {'class': "fixed-recipe-card"})
    processed = [[i.find('a').get('href')] for i in htmlUrls]

    out.writerows(processed)

f.close()

As of November 27, 2018, there were 64,583 active recipes, which suggests it will take approximately 18 hours to scrape every recipe. However, I found that this process took closer to 5 days due to connection loss and other unforeseen issues.
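The arithmetic behind that estimate, as a quick sanity check:

```python
# Back-of-envelope: one request per second, per the robots.txt crawl delay.
recipes_count = 64583
hours = recipes_count * 1 / 3600   # one second of politeness delay per recipe
print(round(hours, 1))             # roughly 18 hours
```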

Because the data is time-consuming to obtain, it is important that it is written to disk and preserved for later use. For this implementation, we've chosen a local SQLite database due to the variable-length nature of the data; recipes can have 0-n ingredients and 0-m steps. If we used a flat text file, we would have to commit to a maximum number of ingredients/steps ahead of time, before we even visited any URLs.

The AllRecipes database will consist of four tables: recipes, directions, nutrition, and ingredients, which correspond to the four types of data we will be harvesting. The recipes table will include metadata about each recipe, such as the author, number of reviews, description, and prep time. The directions table will have a row for each preparation step of each recipe. Similarly, the ingredients table will have a row for each ingredient in each recipe. Finally, the nutrition table will have a row for each recipe (that has nutrition facts listed) and will contain data such as the calorie count and fat content, per serving.

We will now initialize the database and define the tables listed above. First, we will define some helper functions that will format and create the tables and functions that we will use later to insert rows into each table. In the second cell, we will call the create_db() function to build the database.

In [2]:
import sqlite3

def create_conn(db):
    try:
        conn = sqlite3.connect(db)
        return conn

    except Exception as e:
        print(e)

    return None

def create_table(conn, table_sql):
    try:
        cur = conn.cursor()
        cur.execute(table_sql)

    except Exception as e:
        print(e)

def create_db(db):
    sql_recipe = """ CREATE TABLE IF NOT EXISTS recipes (
                                    id integer PRIMARY KEY,
                                    url text NOT NULL,
                                    title text,
                                    author text,
                                    description text,
                                    num_photos integer,
                                    prep_time text,
                                    cook_time text,
                                    total_time text,
                                    rating real,
                                    reviews integer,
                                    made_it integer,
                                    servings integer,
                                    num_steps integer,
                                    num_ingredients integer
                                ); """
 
    sql_directions = """CREATE TABLE IF NOT EXISTS directions (
                                    id integer PRIMARY KEY,
                                    recipe_id integer NOT NULL,
                                    step text NOT NULL,
                                    step_order integer NOT NULL,
                                    FOREIGN KEY (recipe_id) REFERENCES recipes(id)
                                );"""

    sql_nutrition = """CREATE TABLE IF NOT EXISTS nutrition (
                                    id integer PRIMARY KEY,
                                    recipe_id integer NOT NULL,
                                    calories real,
                                    fat real,
                                    carbs real,
                                    protein real,
                                    cholesterol real,
                                    sodium real,
                                    FOREIGN KEY (recipe_id) REFERENCES recipes(id)
                                );"""

    sql_ingredients = """CREATE TABLE IF NOT EXISTS ingredients (
                                    id integer PRIMARY KEY,
                                    recipe_id integer NOT NULL,
                                    ingredient text NOT NULL,
                                    FOREIGN KEY (recipe_id) REFERENCES recipes(id)
                                );"""
 
    conn = create_conn(db)

    if conn is not None:
        create_table(conn, sql_recipe)
        create_table(conn, sql_directions)
        create_table(conn, sql_nutrition)
        create_table(conn, sql_ingredients)
        # only close if the connection was successfully created
        conn.close()

def update_recipe(cur, summary, recipe_id):
    sql = """UPDATE recipes
                SET title = ?,
                author = ?,
                description = ?,
                num_photos = ?,
                prep_time = ?,
                cook_time = ?,
                total_time =?,
                rating = ?,
                reviews = ?,
                made_it = ?,
                servings = ?,
                num_steps = ?,
                num_ingredients = ?
                WHERE id = ?
                """
    values = summary + (recipe_id,)

    cur.execute(sql, values)

def insert_directions(cur, directions, recipe_id):
    sql = """ INSERT INTO directions (recipe_id, step_order, step)
                VALUES (?, ?, ?)"""
    for i in directions:
        values = (recipe_id,) + i
        cur.execute(sql, values)

def insert_ingredients(cur, ingredients, recipe_id):
    sql = """ INSERT INTO ingredients (recipe_id, ingredient)
                VALUES (?, ?)"""
    for i in ingredients:
        values = (recipe_id,) + i
        cur.execute(sql, values)
        
def insert_nutrition(cur, nutrition, recipe_id):
    sql = """ INSERT INTO nutrition
                (recipe_id, calories, fat, carbs, protein,
                cholesterol, sodium)
                VALUES (?, ?, ?, ?, ?, ?, ?)"""
    values = (recipe_id,) + nutrition
    cur.execute(sql, values)

The cell below creates the database.

In [3]:
create_db('allrecipes.db')

Now that we have the database set up, we will insert the URLs that we've already collected.

In [4]:
import csv

f = open('allrecipe_urls.csv', 'r')
reader = csv.reader(f)

conn = create_conn('allrecipes.db')
sql_urls = """INSERT INTO recipes (url) VALUES (?);"""

for url in reader:
    cur = conn.cursor()
    cur.execute(sql_urls, (url[0],))
    
conn.commit()
conn.close()

We have created the database and we have functions to insert data, but we still need code to process the HTML and capture the tags that we are interested in. Once again, we will utilize helper functions to modularize the code.

Each function will take as input a BeautifulSoup HTML object. The function will then use the find or find_all method to locate the HTML tags of interest. Finally, each will return a tuple, because tuples are the data structure that SQLite expects. In some places we need try/except statements because a given HTML tag is not always present. For example, if an author does not include a description for the recipe, there is no description tag in the HTML.

In [5]:
def get_ingredients(soup):
    ingred = [i.text.strip() for i in
              soup.find_all('li', {'class': 'checkList__line'})][0:-3]
    return tuple((i,) for i in ingred)

def get_directions(soup):
    steps = [i.text.strip() for i in
             soup.find_all('span', {'class': 'recipe-directions__list--item'})]
    steps = list(filter(None, steps))

    return tuple(zip(range(len(steps)), steps))

def get_nutrition(soup):
    nutrition = soup.find('div', {'class': 'nutrition-summary-facts'})
                
    calories = nutrition.find('span', {'itemprop': 'calories'}).text.strip(' calories;')
    fat = nutrition.find('span', {'itemprop': 'fatContent'}).text.strip()
    carbs = nutrition.find('span', {'itemprop': 'carbohydrateContent'}).text.strip()
    protein = nutrition.find('span', {'itemprop': 'proteinContent'}).text.strip()
    cholesterol = nutrition.find('span', {'itemprop': 'cholesterolContent'}).text.strip()
    sodium = nutrition.find('span', {'itemprop': 'sodiumContent'}).text.strip()
    
    return (calories, fat, carbs, protein, cholesterol, sodium)

def get_basics(soup):
    title = soup.find('title').text.split(' - ')[0]
    num_photos = soup.find('span', {'class': 'picture-count-link'}).text.strip(' photos')
    author = soup.find('span', {'class': 'submitter__name'}).text
    
    summ_stats = soup.find('div', {'class': 'recipe-summary__stars'})
    rating_long = summ_stats.find('div', {'class': 'rating-stars'}).attrs['data-ratingstars']
    num_reviews = summ_stats.find('meta', {'itemprop': 'reviewCount'}).attrs['content']
    
    made_it = soup.find('span', {'class': 'made-it-count'}).find_next().text.strip('\xa0made it')
    servings = soup.find('meta', {'id': 'metaRecipeServings'}).attrs['content']
    
    # these tags are optional, so fall back to None when they are missing
    try:
        description = soup.find('div', {'class': 'submitter__description'}).text.strip()
    except AttributeError:
        description = None
    
    try:
        prep_time = soup.find('time', {'itemprop': 'prepTime'}).attrs['datetime'].strip('PT')
    except (AttributeError, KeyError):
        prep_time = None
    try:
        cook_time = soup.find('time', {'itemprop': 'cookTime'}).attrs['datetime'].strip('PT')
    except (AttributeError, KeyError):
        cook_time = None
    try:
        total_time = soup.find('time', {'itemprop': 'totalTime'}).attrs['datetime'].strip('PT')
    except (AttributeError, KeyError):
        total_time = None
    
    return (title, author, description, num_photos, prep_time,
               cook_time, total_time, rating_long, num_reviews,
               made_it, servings)

We will define a function that will execute each of the helper functions and combine their output.

In [6]:
def fetch_data(soup):
    
    basics = get_basics(soup)
    directions = get_directions(soup)
    ingredients = get_ingredients(soup)

    basics_ext = basics + (len(directions), len(ingredients))

    return (basics_ext, directions, ingredients)

At this point we have code to process the BeautifulSoup object and capture the page data that we're interested in. However, we still need to write code to visit each URL, call the processing functions, and then call the database functions to insert the data.

In [7]:
def url_to_soup(url):
    r = requests.get(url)
    return BeautifulSoup(r.text, 'html.parser')
In [8]:
def process_html(recipe_id, conn, soup):
    try:
        summary, directions, ingredients = fetch_data(soup)
        
        cur = conn.cursor()
        
        update_recipe(cur, summary, recipe_id)
        insert_directions(cur, directions, recipe_id)
        insert_ingredients(cur, ingredients, recipe_id)
        
        try:
            nutrition = get_nutrition(soup)
            insert_nutrition(cur, nutrition, recipe_id)
            
        except Exception as e:
            print('No nutrition elements: ', recipe_id)
            print(e)
            
        conn.commit()
        
    except Exception as e:
        conn.rollback()
        print(e, recipe_id)

Now we are finally ready to put it all together and crawl AllRecipes. The following code is written such that it can be paused and restarted without losing data or revisiting recipes that have already been crawled.

First, we query the database for the URLs of all recipes that do not yet have metadata and then iterate through each URL. The query returns URLs in a random order, so if we do not have time to collect every recipe, we still end up with a random subset. For each URL we use the url_to_soup function to obtain the HTML and then process_html to locate the elements of interest and insert the data into our database. Once again, we use exception handling to deal with URLs that are no longer active and with faulty connections.

In [9]:
missing_recipes = """SELECT id, url FROM recipes WHERE title is null ORDER BY random()"""

conn = create_conn('allrecipes.db')
cur = conn.cursor()
cur.execute(missing_recipes)

for row in cur:
    recipe_id, url = row
    time.sleep(1)
    
    try:
        soup = url_to_soup(url)
        process_html(recipe_id, conn, soup)
        
    except Exception as e:
        print(e, row)

When this code was last executed, we were only able to successfully scrape 64,368 recipes, missing 215. When we manually visited the URLs that we couldn't capture, we noted that some resulted in 404 errors because the recipe had been deleted, but the majority were still active. However, the styling of those pages was very different from the successfully captured URLs: the scraper failed because the page elements of interest were located elsewhere in the HTML. Below is an example of an outlier recipe.

Instead of writing a second set of code to scrape these outlier pages, we decided to leave them out.

Outlier Recipe
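The triage described above could be sketched programmatically by checking HTTP status codes. The `get` argument below is injectable so the logic runs without network access; in practice you would pass requests.get, and `failed_urls` would be the list of URLs the scraper missed:

```python
from collections import namedtuple

# Sketch: separate deleted recipes (HTTP 404) from live pages whose
# layout simply changed (HTTP 200 but unparseable with our selectors).
def triage(failed_urls, get):
    deleted, layout_changed = [], []
    for url in failed_urls:
        status = get(url).status_code
        if status == 404:
            deleted.append(url)
        elif status == 200:
            layout_changed.append(url)
    return deleted, layout_changed

# Toy stand-in responses for demonstration; no real requests are made.
Resp = namedtuple('Resp', 'status_code')
fake_get = lambda url: Resp(404 if 'deleted' in url else 200)
print(triage(['/recipe/deleted-pie', '/recipe/live-cake'], fake_get))
```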

Data Processing

From this point on, we will be using Pandas and NumPy to clean up our data and prepare it for exploratory analysis. We will read the recipe metadata into a dataframe and take a cursory glance to identify areas in need of further processing. Later, we will take a look at ingredients, directions, and nutrition.

In [10]:
import pandas as pd

conn = create_conn('allrecipes.db')
# only pull metadata for recipes we successfully crawled
sql = 'SELECT * FROM recipes WHERE title is not null;'
recipes = pd.read_sql(sql=sql, con=conn)

Most of the data that we collected is self-explanatory, but I will note that made_it represents the number of users who self-reported attempting to cook the recipe.

In [11]:
recipes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64426 entries, 0 to 64425
Data columns (total 15 columns):
id                 64426 non-null int64
url                64426 non-null object
title              64426 non-null object
author             64426 non-null object
description        64426 non-null object
num_photos         64426 non-null object
prep_time          57845 non-null object
cook_time          47689 non-null object
total_time         58200 non-null object
rating             64426 non-null float64
reviews            64426 non-null int64
made_it            64426 non-null object
servings           64426 non-null int64
num_steps          64426 non-null int64
num_ingredients    64426 non-null int64
dtypes: float64(1), int64(5), object(9)
memory usage: 7.4+ MB
In [38]:
len(recipes[recipes.made_it.str.contains('k') == True]['made_it'])
Out[38]:
1360
In [12]:
recipes.head(1)
Out[12]:
id url title author description num_photos prep_time cook_time total_time rating reviews made_it servings num_steps num_ingredients
0 1 https://www.allrecipes.com/recipe/222980/1970s... 1970's French Strawberry Pie Recipe buzzsau "My mom used to make this for our family back ... 2 15M 10M 2H25M 4.5 9 19 8 4 10

Immediately we can see that we will need to correct the three time variables. They are currently saved as strings in an odd format (number-of-hours "H" number-of-minutes "M"), and we will need to convert them to timestamps. We will use a regex to identify the hour and minute numbers and then create a new column in the dataframe with the combined time.

In [13]:
def fix_time(ser):
    # capture optional days, optional hours, and the minutes, e.g. "2H25M"
    temp = ser.str.extract(r"(?:(\d+)Days*)?(?:(\d+)H)?(\d+)M?")
    temp.fillna(value=0, inplace=True)
    temp = temp.astype('int32')
    # convert everything to minutes, then format as day:hour:minute
    temp['final'] = pd.to_datetime(temp[0]*60*24 + temp[1]*60 + temp[2], unit='m').dt.strftime('%d:%H:%M')
    
    return temp['final']
    
In [14]:
recipes['prep_time'] = fix_time(recipes.prep_time)
recipes['cook_time'] = fix_time(recipes.cook_time)
recipes['total_time'] = fix_time(recipes.total_time)

There is also an issue with the made_it column: it contains both integer and string values. Any recipe that has been made more than 1,000 times does not have an exact integer value; instead it is rounded to the nearest 1,000 and expressed in the form "1k". Let's correct this using a regex.

In [63]:
has_k = recipes.made_it.astype('str').str.contains('k')
recipes.made_it = recipes.made_it.astype('str').str.extract(r"(\d+)k?", expand=False)
recipes.made_it = recipes.made_it.astype('int32')
# "1k" means roughly 1,000, so scale the rounded values back up
recipes.loc[has_k, 'made_it'] *= 1000
In [60]:
recipes.head(1)
Out[60]:
id url title author description num_photos prep_time cook_time total_time rating reviews made_it servings num_steps num_ingredients
0 1 https://www.allrecipes.com/recipe/222980/1970s... 1970's French Strawberry Pie Recipe buzzsau "My mom used to make this for our family back ... 2 01:00:15 01:00:10 01:02:25 4.5 9 19 8 4 10

Recipe Exploratory Data Analysis

Now that the data is clean, we can start to explore the relationship between ingredients and ratings. We will continue to utilize Pandas for the heavy lifting, but we will also use Seaborn to visualize relationships.

First we will take a look at the average recipe rating by total number of ingredients.

In [16]:
%matplotlib inline
import seaborn as sb
In [24]:
data = recipes.groupby(by='num_ingredients').mean()['rating']

ax = sb.lineplot(data=data, color="#34495e", legend='full')
_ = ax.set_title('Rating (grey) and recipes (blue) by Number of Ingredients')
_ = ax.set_ylabel('Average Rating')
_ = ax.set_xlabel('Ingredients')

ax2 = ax.twinx()
ax2 = sb.lineplot(data=recipes.groupby(by='num_ingredients').count()['rating'])
_ = ax2.set_ylabel('Count Recipes')

The mean rating is surprisingly constant with respect to the number of ingredients, though it does decrease slightly as the number of ingredients increases. For recipes with >20 ingredients there is much more variability in the average rating.

This is likely because there are very few recipes with this many ingredients. To test this, we can recreate this graph using the variance of rating instead of the average.

In [25]:
data = recipes.groupby(by='num_ingredients').var()['rating']

ax = sb.lineplot(data=data, color="#34495e", legend='full')
_ = ax.set_title('Rating (grey) and recipes (blue) by Number of Ingredients')
_ = ax.set_ylabel('Rating Variance')
_ = ax.set_xlabel('Ingredients')

ax2 = ax.twinx()
ax2 = sb.lineplot(data=recipes.groupby(by='num_ingredients').count()['rating'])
_ = ax2.set_ylabel('Count Recipes')

Variance is positively correlated with the number of ingredients, and the trend holds even for recipes with <20 ingredients. So even though the average rating does not decrease by much, users become more polarized in the ratings that they provide.

Finally, let's check whether there is a relationship between the number of ingredients and the number of users who self-report that they cooked the recipe.

In [67]:
data = recipes.groupby(by='num_ingredients').mean()['made_it']

ax = sb.lineplot(data=data, color="#34495e", legend='full')
_ = ax.set_title('Made_it (grey) and recipes (blue) by Number of Ingredients')
_ = ax.set_ylabel('Average made_it')
_ = ax.set_xlabel('Ingredients')

ax2 = ax.twinx()
ax2 = sb.lineplot(data=recipes.groupby(by='num_ingredients').count()['rating'])
_ = ax2.set_ylabel('Count Recipes')

The more ingredients a recipe has, the less likely users are to attempt cooking it. This could be due to the complexity of the recipe, or simply because recipes with a high ingredient count are rare and users have a hard time finding them.

Ingredient Processing

To limit the scope of this tutorial, I decided to use a part-of-speech tagger trained specifically on recipes by the NYTimes engineering team. I will briefly run through the usage of the model, but you can visit the GitHub page yourself to learn more.

We will run the ingredient list through the tagger so that we can separate ingredients from measurements or other text. For example, if a recipe calls for "2 cups shredded mozzarella cheese" we would like to identify "2 cups" as a measurement, "shredded" as a verb describing the state of the ingredient and "mozzarella cheese" as the base ingredient. For this tutorial we will focus our analysis only on the base ingredients.
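To make that decomposition concrete, here is a deliberately naive, hand-rolled sketch of the split. This is not the NYT tagger (which uses a trained CRF model and handles far messier phrasing); the unit and modifier word lists below are illustrative assumptions:

```python
import re

# Toy illustration of the labels we want out of the tagger.
# Only handles "<qty> <unit> [modifier] <name>" phrases.
UNITS = {'cup', 'cups', 'teaspoon', 'teaspoons', 'tablespoon',
         'tablespoons', 'ounce', 'ounces', 'pound', 'pounds'}
MODIFIERS = {'shredded', 'chopped', 'diced', 'minced', 'sliced'}

def naive_parse(phrase):
    tokens = phrase.lower().split()
    # peel off a leading quantity like "2" or "1/2", then unit, then modifier
    qty = tokens.pop(0) if re.fullmatch(r'[\d/]+', tokens[0]) else None
    unit = tokens.pop(0) if tokens and tokens[0] in UNITS else None
    comment = tokens.pop(0) if tokens and tokens[0] in MODIFIERS else None
    return {'qty': qty, 'unit': unit, 'comment': comment,
            'name': ' '.join(tokens)}

print(naive_parse("2 cups shredded mozzarella cheese"))
# → {'qty': '2', 'unit': 'cups', 'comment': 'shredded', 'name': 'mozzarella cheese'}
```

A rule-based parser like this breaks down quickly on real ingredient lines ("1 (8 ounce) package cream cheese, softened"), which is exactly why we lean on the CRF model instead.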

To start, we will follow their directions and train the model on a subset of their data using the default parameters. I will collapse the output since it prints quite a lot to the console.

In [75]:
import os
os.chdir('ingredient-phrase-tagger/')
! ./roundtrip.sh
generating training data...
/anaconda3/lib/python3.6/re.py:212: FutureWarning: split() requires a non-empty pattern match.
  return _compile(pattern, flags).split(string, maxsplit)
generating test data...
/anaconda3/lib/python3.6/re.py:212: FutureWarning: split() requires a non-empty pattern match.
  return _compile(pattern, flags).split(string, maxsplit)
training...
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2013 Taku Kudo, All rights reserved.

reading training data: 100.. 200.. 300.. 400.. 500.. 600.. 700.. 800.. 900.. 1000.. 1100.. 1200.. 1300.. 1400.. 1500.. 1600.. 1700.. 1800.. 1900.. 2000.. 2100.. 2200.. 2300.. 2400.. 2500.. 2600.. 2700.. 2800.. 2900.. 3000.. 3100.. 3200.. 3300.. 3400.. 3500.. 3600.. 3700.. 3800.. 3900.. 4000.. 4100.. 4200.. 4300.. 4400.. 4500.. 4600.. 4700.. 4800.. 4900.. 5000.. 5100.. 5200.. 5300.. 5400.. 5500.. 5600.. 5700.. 5800.. 5900.. 6000.. 6100.. 6200.. 6300.. 6400.. 6500.. 6600.. 6700.. 6800.. 6900.. 7000.. 7100.. 7200.. 7300.. 7400.. 7500.. 7600.. 7700.. 7800.. 7900.. 8000.. 8100.. 8200.. 8300.. 8400.. 8500.. 8600.. 8700.. 8800.. 8900.. 9000.. 9100.. 9200.. 9300.. 9400.. 9500.. 9600.. 9700.. 9800.. 9900.. 10000.. 10100.. 10200.. 10300.. 10400.. 10500.. 10600.. 10700.. 10800.. 10900.. 11000.. 11100.. 11200.. 11300.. 11400.. 11500.. 11600.. 11700.. 11800.. 11900.. 12000.. 12100.. 12200.. 12300.. 12400.. 12500.. 12600.. 12700.. 12800.. 12900.. 13000.. 13100.. 13200.. 13300.. 13400.. 13500.. 13600.. 13700.. 13800.. 13900.. 14000.. 14100.. 14200.. 14300.. 14400.. 14500.. 14600.. 14700.. 14800.. 14900.. 15000.. 15100.. 15200.. 15300.. 15400.. 15500.. 15600.. 15700.. 15800.. 15900.. 16000.. 16100.. 16200.. 16300.. 16400.. 16500.. 16600.. 16700.. 16800.. 16900.. 17000.. 17100.. 17200.. 17300.. 17400.. 17500.. 17600.. 17700.. 17800.. 17900.. 18000.. 18100.. 18200.. 18300.. 18400.. 18500.. 18600.. 18700.. 18800.. 18900.. 19000.. 19100.. 19200.. 19300.. 19400.. 19500.. 19600.. 19700.. 19800.. 19900.. 
Done!1.15 s

Number of sentences: 19988
Number of features:  379656
Number of thread(s): 4
Freq:                1
eta:                 0.00010
C:                   1.00000
shrinking size:      20
iter=0 terr=0.86627 serr=1.00000 act=379656 obj=256271.09135 diff=1.00000
iter=1 terr=0.49386 serr=0.89459 act=379656 obj=190324.57813 diff=0.25733
iter=2 terr=0.36968 serr=0.72959 act=379656 obj=130463.89461 diff=0.31452
iter=3 terr=0.39388 serr=0.73654 act=379656 obj=112646.04491 diff=0.13657
iter=4 terr=0.36223 serr=0.67796 act=379656 obj=99285.49717 diff=0.11861
iter=5 terr=0.35032 serr=0.66940 act=379656 obj=92961.75423 diff=0.06369
iter=6 terr=0.28854 serr=0.54888 act=379656 obj=80966.50267 diff=0.12903
iter=7 terr=0.28796 serr=0.58150 act=379656 obj=76687.36260 diff=0.05285
iter=8 terr=0.25126 serr=0.50640 act=379656 obj=71131.74179 diff=0.07245
iter=9 terr=0.23925 serr=0.48274 act=379656 obj=68666.11719 diff=0.03466
iter=10 terr=0.23135 serr=0.46118 act=379656 obj=65141.23047 diff=0.05133
iter=11 terr=0.21596 serr=0.42080 act=379656 obj=59607.10174 diff=0.08496
iter=12 terr=0.21150 serr=0.41555 act=379656 obj=53396.66674 diff=0.10419
iter=13 terr=0.19457 serr=0.38608 act=379656 obj=50895.28705 diff=0.04685
iter=14 terr=0.19556 serr=0.36477 act=379656 obj=46884.08972 diff=0.07881
iter=15 terr=0.18237 serr=0.35541 act=379656 obj=45714.32584 diff=0.02495
iter=16 terr=0.17846 serr=0.35146 act=379656 obj=44809.05480 diff=0.01980
iter=17 terr=0.17984 serr=0.35872 act=379656 obj=42831.79690 diff=0.04413
iter=18 terr=0.16320 serr=0.33055 act=379656 obj=40938.97754 diff=0.04419
iter=19 terr=0.15814 serr=0.32099 act=379656 obj=40219.15547 diff=0.01758
iter=20 terr=0.15772 serr=0.31349 act=379656 obj=39537.19529 diff=0.01696
iter=21 terr=0.15171 serr=0.30603 act=379656 obj=38646.57164 diff=0.02253
iter=22 terr=0.15622 serr=0.31259 act=379656 obj=37954.50756 diff=0.01791
iter=23 terr=0.15241 serr=0.30879 act=379656 obj=37649.67647 diff=0.00803
iter=24 terr=0.14694 serr=0.30158 act=379656 obj=36733.44333 diff=0.02434
iter=25 terr=0.14999 serr=0.31014 act=379656 obj=35692.66977 diff=0.02833
iter=26 terr=0.14284 serr=0.29498 act=379656 obj=35215.84993 diff=0.01336
iter=27 terr=0.14153 serr=0.29132 act=379656 obj=34551.46109 diff=0.01887
iter=28 terr=0.14398 serr=0.29268 act=379656 obj=34191.89483 diff=0.01041
iter=29 terr=0.14109 serr=0.28782 act=379656 obj=33883.77409 diff=0.00901
iter=30 terr=0.13904 serr=0.28492 act=379656 obj=33614.81387 diff=0.00794
iter=31 terr=0.13589 serr=0.28177 act=379656 obj=32920.37115 diff=0.02066
iter=32 terr=0.13281 serr=0.27767 act=379656 obj=32436.45167 diff=0.01470
iter=33 terr=0.13090 serr=0.27436 act=379656 obj=31837.06328 diff=0.01848
iter=34 terr=0.13243 serr=0.27617 act=379656 obj=31532.35360 diff=0.00957
iter=35 terr=0.13103 serr=0.27537 act=379656 obj=31233.59876 diff=0.00947
iter=36 terr=0.17521 serr=0.29988 act=379656 obj=32025.94612 diff=0.02537
iter=37 terr=0.13630 serr=0.28012 act=379656 obj=31031.51782 diff=0.03105
iter=38 terr=0.13239 serr=0.27772 act=379656 obj=30664.11887 diff=0.01184
iter=39 terr=0.13050 serr=0.27361 act=379656 obj=30323.49130 diff=0.01111
iter=40 terr=0.12772 serr=0.26816 act=379656 obj=29808.98252 diff=0.01697
iter=41 terr=0.12482 serr=0.26286 act=379656 obj=29393.40172 diff=0.01394
iter=42 terr=0.12411 serr=0.25956 act=379656 obj=29136.14645 diff=0.00875
iter=43 terr=0.12544 serr=0.26011 act=379656 obj=28781.79104 diff=0.01216
iter=44 terr=0.12056 serr=0.25385 act=379656 obj=28560.33595 diff=0.00769
iter=45 terr=0.12092 serr=0.25415 act=379656 obj=28312.41578 diff=0.00868
iter=46 terr=0.11956 serr=0.25275 act=379656 obj=27945.14301 diff=0.01297
iter=47 terr=0.12194 serr=0.25490 act=379656 obj=27781.00096 diff=0.00587
iter=48 terr=0.11703 serr=0.25085 act=379656 obj=27489.94795 diff=0.01048
iter=49 terr=0.11546 serr=0.24960 act=379656 obj=27384.61989 diff=0.00383
iter=50 terr=0.11289 serr=0.24570 act=379656 obj=27139.30976 diff=0.00896
iter=51 terr=0.11261 serr=0.24630 act=379656 obj=26873.01013 diff=0.00981
iter=52 terr=0.10909 serr=0.24220 act=379656 obj=26666.62675 diff=0.00768
iter=53 terr=0.11045 serr=0.24210 act=379656 obj=26374.30591 diff=0.01096
iter=54 terr=0.11081 serr=0.24074 act=379656 obj=26249.76191 diff=0.00472
iter=55 terr=0.10854 serr=0.23624 act=379656 obj=26057.05116 diff=0.00734
iter=56 terr=0.11131 serr=0.23929 act=379656 obj=25852.52120 diff=0.00785
iter=57 terr=0.10969 serr=0.23784 act=379656 obj=25690.46048 diff=0.00627
iter=58 terr=0.10575 serr=0.23059 act=379656 obj=25236.32042 diff=0.01768
iter=59 terr=0.10528 serr=0.22874 act=379656 obj=24874.34510 diff=0.01434
iter=60 terr=0.10176 serr=0.22614 act=379656 obj=24799.78697 diff=0.00300
iter=61 terr=0.10297 serr=0.22529 act=379656 obj=24533.35192 diff=0.01074
iter=62 terr=0.10338 serr=0.22514 act=379656 obj=24456.77894 diff=0.00312
iter=63 terr=0.10271 serr=0.22433 act=379656 obj=24317.76380 diff=0.00568
iter=64 terr=0.11275 serr=0.23169 act=379656 obj=24510.69640 diff=0.00793
...
[training log truncated: terr falls from roughly 0.10 to 0.054 and serr from 0.22 to 0.147 over 323 iterations]
...
iter=322 terr=0.05392 serr=0.14724 act=379656 obj=17449.54578 diff=0.00010
iter=323 terr=0.05395 serr=0.14719 act=379656 obj=17448.09093 diff=0.00008

Done! 345.75 s

testing...
visualizing...
evaluating...

Sentence-Level Stats:
	correct:  1490
	total:  2000
	% correct:  74.5

Word-Level Stats:
	correct: 10404
	total: 11459
	% correct: 90.79326293742909

Now that the model is trained, we can apply it to our dataset using the tools the project provides. First, we fetch all of the recipe ingredients from the database and write them to a text file to serve as input.

In [82]:
sql = 'SELECT * FROM ingredients;'
ingred = pd.read_sql(sql=sql, con=conn)
ingred.ingredient.to_csv("ingredients_raw.csv", index=False)
In [78]:
! python bin/parse-ingredients.py ingredients_raw.csv > results.txt
! python bin/convert-to-json.py results.txt > results.json
/anaconda3/lib/python3.6/re.py:212: FutureWarning: split() requires a non-empty pattern match.
  return _compile(pattern, flags).split(string, maxsplit)

We will read the JSON file into pandas and then perform a merge to combine it with the ingredients data we already have. Here we are merging on indexes because we know that the data has not been reordered.

In [90]:
ingred_nyt = pd.read_json('results.json')
ingred_total = ingred.merge(ingred_nyt, how='left', left_index=True, right_index=True)

Because we're only interested in the basic ingredient, we will reduce the dataset's columns to the basic ingredient name and recipe_id. We'll need to drop duplicates because the same basic ingredient can appear multiple times in the same recipe. We don't have to look further than the first 5 ingredients to see why...

HINT: strawberries

In [95]:
ingred_total.head()
Out[95]:
id recipe_id ingredient comment display input name other qty range_end unit
0 1 1 1 (4 ounce) package cream cheese, softened "1 () package, softened" <span class='comment'>"1 (</span><span class='... "1 (4 ounce) package cream cheese, softened" cream cheese NaN 4 NaN ounce
1 2 1 1 tablespoon heavy whipping cream NaN <span class='qty'>1</span><span class='unit'>t... 1 tablespoon heavy whipping cream heavy whipping cream NaN 1 NaN tablespoon
2 3 1 1 (9 inch) prepared shortbread pie crust (such... (9 inch) prepared shortbread crust (such as Ke... <span class='qty'>1</span><span class='comment... 1 (9 inch) prepared shortbread pie crust (such... pie NaN 1 NaN NaN
3 4 1 2 cups fresh strawberries, quartered fresh, quartered" <span class='qty'>"2</span><span class='unit'>... "2 cups fresh strawberries, quartered" strawberries NaN "2 NaN cup
4 5 1 2 cups fresh strawberries, mashed fresh, mashed" <span class='qty'>"2</span><span class='unit'>... "2 cups fresh strawberries, mashed" strawberries NaN "2 NaN cup
In [126]:
ingred_dedup = ingred_total[['recipe_id','name']].drop_duplicates()
ingred_dedup.dropna(inplace=True)

Ingredient Analysis

To get started, we will add a new column to the dataframe to summarize the frequency of each ingredient in the entire dataset.
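As a quick toy illustration (hypothetical data, not the real set), `groupby().transform('count')` broadcasts each group's size back onto every row of that group:

```python
import pandas as pd

# Hypothetical miniature version of ingred_dedup.
df = pd.DataFrame({"recipe_id": [1, 1, 2, 3],
                   "name": ["salt", "sugar", "salt", "salt"]})

# Each row receives the total count of its `name` across the frame.
df["freq"] = df.groupby("name")["name"].transform("count")
print(df["freq"].tolist())  # -> [3, 1, 3, 3]
```

Unlike an aggregation, `transform` keeps the original row count, which is why we can assign the result straight back as a new column.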

In [127]:
ingred_dedup['freq'] = ingred_dedup.groupby('name')['name'].transform('count')
In [128]:
ingred_dedup.freq.describe()
Out[128]:
count    567749.000000
mean       5176.520499
std        6275.449382
min           1.000000
25%         387.000000
50%        2385.000000
75%        9780.000000
max       23092.000000
Name: freq, dtype: float64

It's refreshing to see that there are many ingredients with a high frequency in our dataset. This bodes well for our future analysis. However, there are some ingredients that only appear once. It looks like these are a combination of obscure or overly specific ingredients as well as mistakes by the tagging model.

In [129]:
ingred_dedup.loc[ingred_dedup.freq == 1].head(10)
Out[129]:
recipe_id name freq
83 3 Campbell's® Tomato Juice Tomato Juice 1
92 8 "1 jar chunky salsa 1
94 8 Add-Ins: 1
111 11 couscous rice 1
115 12 hot biscuits 1
134 16 PHILADELPHIA 1
224 28 corkscrew-shaped pasta 1
243 35 lemon-flavored soda 1
274 40 Lemon Buttercream: 1
282 41 square semisweet baking chocolate 1
In [130]:
len(ingred_dedup.name.value_counts())
Out[130]:
18394

There are 18,394 unique ingredients in our set. We will pare this down slightly to make it more manageable. Then we will create a concatenated column that has a comma separated string of every ingredient that appears in the recipe.

In [230]:
ingred_reduced = ingred_dedup.loc[ingred_dedup.freq >= 500].copy()
In [231]:
ingred_reduced['text'] = ingred_reduced[['recipe_id','name']].groupby(['recipe_id'])['name'].transform(lambda x: ','.join(x))

Using sklearn's CountVectorizer, we can efficiently transform the dataframe into a recipe-by-ingredient matrix, where each cell counts how many times an ingredient appears in the recipe.

In [301]:
from sklearn.feature_extraction.text import CountVectorizer

# Despite the variable name, this is a plain CountVectorizer, not TF-IDF.
tfidf = CountVectorizer(
    tokenizer=lambda x: x.lower().split(','),
    preprocessor=lambda x: x,
)

x = tfidf.fit_transform(ingred_reduced.text)

To turn this into an ingredient-by-ingredient matrix, we multiply the matrix by its transpose. We read the result back into a dataframe, use the feature names (ingredient names) as the index and columns, and set all entries on the diagonal to 0. Finally, we perform a modified normalization of the dataframe to force all values to be positive, a condition we need in order to construct the network graph.
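Before running this on the full matrix, a toy example may help show why multiplying a recipe-by-ingredient matrix by its transpose yields co-occurrence counts (the ingredient names here are purely illustrative):

```python
import numpy as np

# Toy recipe-by-ingredient matrix: rows are recipes, columns are
# three illustrative ingredients (salt, flour, sugar).
X = np.array([
    [1, 1, 0],  # recipe 0 uses salt and flour
    [1, 0, 1],  # recipe 1 uses salt and sugar
    [1, 1, 1],  # recipe 2 uses all three
])

# Entry (i, j) of X^T X counts recipes containing both ingredient i
# and ingredient j; the diagonal counts total appearances.
cooc = X.T @ X
print(cooc)
```

Salt and flour co-occur in two recipes and flour and sugar in one, which is exactly what the off-diagonal entries report.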

In [389]:
import numpy as np

y = x.T * x
final = pd.DataFrame(y.todense(), columns=tfidf.get_feature_names(), index=tfidf.get_feature_names())
final.head()
Out[389]:
"1 all-purpose flour allspice almond extract almonds apple apple cider vinegar apples applesauce bacon ... water wheat flour white pepper white sugar white vinegar white wine worcestershire sauce yellow onion yogurt zucchini
"1 5004 602 24 59 117 38 19 56 47 139 ... 888 0 56 679 63 10 176 112 30 87
all-purpose flour 602 110938 3598 3251 3384 760 500 3240 2915 2395 ... 27678 3530 1249 62558 2079 1192 2435 700 1301 2586
allspice 24 3598 8291 67 377 217 242 517 455 61 ... 2517 390 168 3504 417 67 314 73 198 126
almond extract 59 3251 67 5412 1221 35 36 118 113 11 ... 1341 61 0 3841 27 9 0 0 111 12
almonds 117 3384 377 1221 11565 207 232 268 243 270 ... 2605 282 145 4587 148 121 140 0 357 123

5 rows × 169 columns

In [390]:
final_norm = abs(final - final.values.mean()) / final.values.std()
np.fill_diagonal(final_norm.values, 0)  # zero the diagonal without deprecated non-tuple indexing

Our final dataset contains only 169 ingredients because of the ingredient reduction we performed in the last section. Had we not performed that step, we would have thousands of ingredients to visualize in a network graph, and at that size the centrality computations we are interested in would be prohibitively expensive.

Graphical Analysis

I originally intended to spend the majority of my tutorial on this section, but considering the length requirements of this tutorial, I will keep my analysis brief.

NetworkX is a network analysis package that allows you to build, render, and run algorithms against graphs. To build a graph from our final ingredient-by-ingredient matrix, we simply read in the dataframe and call a draw function.

In [391]:
import networkx
In [392]:
G = networkx.from_pandas_adjacency(final_norm)
In [395]:
networkx.draw_networkx(G, pos=networkx.drawing.layout.kamada_kawai_layout(G),
                       node_size=100, width=0.1)

As you can see from the giant smush of black, this graph is highly connected, meaning that nearly every ingredient is used with every other ingredient in at least one recipe. We're more interested in common groupings of ingredients, so we should simplify the graph by dropping edges that fall below a certain threshold. For now, let's arbitrarily choose 10.

In [399]:
H = networkx.Graph()

edg = [(u, v) for (u, v, d) in G.edges(data=True) if d['weight'] > 10]
nd = [n for e in edg for n in e]  # both endpoints, so no node is dropped

H.add_nodes_from(set(nd))
H.add_edges_from(edg)
networkx.draw_networkx(H, pos=networkx.drawing.layout.kamada_kawai_layout(H),
                       font_size=8)

It should be no surprise that salt is at the center of this graph, since it is ubiquitous. More interesting is that even when limited to 11 ingredients, we can see the divide between sweet and savory. The baking ingredients are highly interconnected and form their own community; they are joined to the savory ingredients only by salt.

The savory ingredients are more disjoint, with only salt, onion and garlic forming a triangle.

Let's perform this analysis again, but relax the threshold from 10 to 5 to see if the savory ingredients start to show a similar relationship.

In [416]:
H = networkx.Graph()

edg = [(u, v) for (u, v, d) in G.edges(data=True) if d['weight'] > 5]
nd = [n for e in edg for n in e]  # both endpoints, so no node is dropped

H.add_nodes_from(set(nd))
H.add_edges_from(edg)
In [417]:
from matplotlib import pyplot as plt

plt.figure(figsize=(100,100))
networkx.draw_networkx(H, pos=networkx.drawing.layout.kamada_kawai_layout(H),
                       font_size=100, node_size=10000)
plt.show()

The savory ingredients are starting to show the same relationship, and now there are three bridges: salt, water, and butter.

Remember, we are generating these graphs from a normalized ingredient-by-ingredient matrix, so we can start to draw conclusions about the types of recipes on AllRecipes. There seems to be a high concentration of baked goods as well as, dare I say, Italian recipes.

Let's zoom out one more time just for fun.

In [418]:
I = networkx.Graph()

edg = [(u, v) for (u, v, d) in G.edges(data=True) if d['weight'] > 3]
nd = [n for e in edg for n in e]  # both endpoints, so no node is dropped

I.add_nodes_from(set(nd))
I.add_edges_from(edg)

plt.figure(figsize=(100,100))
networkx.draw_networkx(I, pos=networkx.drawing.layout.kamada_kawai_layout(I),
                       font_size=100, node_size=10000)
plt.show()

Closing Remarks

Users on AllRecipes tend to prefer simpler recipes with fewer ingredients. Not only do users rate low-ingredient recipes higher, they are also more likely to attempt to cook them themselves. There are two clear themes among the recipes: baked goods and savory Italian meals.

This tutorial provides only a cursory look at the ingredient web. Curious readers are encouraged to explore the graph further by examining measures of centrality and cliques.
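As a starting point for that exploration, here is a sketch of what computing centrality and cliques might look like in networkx. The graph below is a hypothetical miniature stand-in for the ingredient graph; its nodes and edges are assumptions for illustration, not results from the dataset.

```python
import networkx as nx

# Hypothetical miniature ingredient graph (illustrative edges only).
H = nx.Graph()
H.add_edges_from([
    ("salt", "flour"), ("salt", "sugar"), ("salt", "butter"),
    ("flour", "sugar"), ("flour", "butter"), ("salt", "onion"),
])

# Degree centrality: the fraction of other nodes each node touches.
centrality = nx.degree_centrality(H)
most_central = max(centrality, key=centrality.get)

# Maximal cliques: fully interconnected ingredient groupings.
cliques = [sorted(c) for c in nx.find_cliques(H)]
print(most_central, cliques)
```

Applied to the real graph, a measure like degree or betweenness centrality would quantify salt's bridging role, while the cliques would surface candidate ingredient "communities" such as the baking cluster.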